Skip to content

Dispatch for drawing multiples #1985

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

mmikhasenko
Copy link

@mmikhasenko mmikhasenko commented Jun 17, 2025

Closes #1984

Implementation

  • Dispatch for MixtureModel
  • Dispatch for Truncated

Checkpoints

Sanity checks

Speed

For mixture model,

julia> using Distributions
julia> using BenchmarkTools

julia> d = MixtureModel([Normal(1,0.1), Normal(2,0.1), Normal(3,0.1)], [0.3,0.3,0.4])
julia> @btime rand($d, 10_000);
  73.708 μs (7 allocations: 125.67 KiB)

julia> @btime [rand($d) for _ in 1:10_000];
  124.125 μs (3 allocations: 96.06 KiB)

For truncated

t = truncated(Normal(), 0, 3)
@btime rand($t, 10_000);  # 114.625 μs (6 allocations: 256.12 KiB)
@btime [rand($t) for _ in 1:10_000];  # 169.875 μs (3 allocations: 96.06 KiB)

Visual

d = MixtureModel([Normal(1,0.1), Normal(2,0.1), Normal(3,0.1)], [0.3,0.3,0.4])
s1 = [rand(d) for _ in 1:10000]
s2 = rand(d, 10000)
let
    plot(); 
    stephist!(s1, bins=range(0, 4, 100), lab="rand(d) n times")
    stephist!(s2, bins=range(0, 4, 100), lab="rand(d,n)")
end

image

t = truncated(Normal(), 0, 3)
r1 = [rand(t) for _ in 1:10000]
r2 = rand(t, 10000)
let
    plot(); 
    stephist!(t1, bins=range(0, 4, 100), lab="rand(d) n times")
    stephist!(t2, bins=range(0, 4, 100), lab="rand(d,n)")
end

image

@mmikhasenko mmikhasenko marked this pull request as draft June 17, 2025 20:52
@codecov-commenter
Copy link

codecov-commenter commented Jun 17, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.26%. Comparing base (abb151c) to head (97e3b25).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1985      +/-   ##
==========================================
- Coverage   86.28%   86.26%   -0.03%     
==========================================
  Files         146      146              
  Lines        8787     8847      +60     
==========================================
+ Hits         7582     7632      +50     
- Misses       1205     1215      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
mmikhasenko and others added 5 commits June 18, 2025 16:00
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
@mmikhasenko
Copy link
Author

mmikhasenko commented Jun 18, 2025

@devmotion thanks a lot for the review.

  • Questions should be addressed.
  • truncated of small mass: I've fixed broken pipeline for large truncated of small remaining volume - now, it falls back to rand(n) if needs to generate more than 10M, this cut value is somewhat arbitrary, so suggestions are welcome.
  • docstrings are removed.
  • made sanity checks of comparing samples of rand(d) and rand(d,n)

No docs are added

@mmikhasenko
Copy link
Author

The last item that I think about is the 10M cap for switching between algorithm.
It's bad to switch between algorithms in general for predictability, so the design of Truncated with a switch is not ideal.
But we will not address it in this MR.

I'm thinking to proceed with a similar if/else of 0.25% of distribution. Similar (not-ideal) patterns will be easy to identify in feature and update.

@mmikhasenko
Copy link
Author

@devmotion thanks for the last comments.

  • the truncated was easy to fix,
  • I've modified the code rand + resize strategy for the mixture model

@mmikhasenko mmikhasenko marked this pull request as ready for review June 23, 2025 19:31
@mmikhasenko
Copy link
Author

Re-evaluated benchmarks in the header,
the rand-n version is faster for both models

@mmikhasenko
Copy link
Author

Comments are resolved, this PR is ready for review

@devmotion
Copy link
Member

As a general comment before any detailed review:

the rand-n version is faster for both models

Can we test different number of samples? n = 10000 is a bit extreme. Can we check n = 1, n = 5, n = 10, n = 50, n = 100, n = 500, n = 1000 etc. as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rand(d, 1000) is not propagated to components
3 participants